Influence of genetic substructuring of statistical forensic parameters on genetic short tandem repeat markers in the populations of Southeastern Europe

Aim To investigate the influence of specific intrapopulation genetic structures on interpopulation relationships. Special focus was the influence of island population isolation on the substructuring of the Croatian population, and the influence of regional population groups on the substructuring of Southeast European populations. Methods Autosomal short tandem repeat (STR) loci were analyzed by using four forensic parameters: matching probability (PM), power of discrimination (PD), power of exclusion (PE), and polymorphic information content (PIC) on a sample of 2877 unrelated participants of both sexes. A sample set comprising 590 participants was analyzed for the first time, and 2287 participants were included from previous studies. The analysis was performed with PowerStats v. 1.2. Results The analysis of forensic parameters for all nine loci in the Croatian subpopulations showed the largest deviations in the populations of the islands of Korčula and Hvar. The smallest deviations were found in the mainland population. As for Southeast European populations, the largest deviations were found in the population of North Macedonia, followed by Romania, Albanians from Kosovo, and Montenegro, while the smallest deviations were found in the population of Hungary. Conclusion The comparison of forensic parameters between different subpopulations of Croatia and Southeast Europe indicates that the isolation of individual Croatian subpopulations and rare alleles in their gene pool affect the values of forensic parameters. Specific features of (sub)populations should be taken into account for appropriate sampling of the total population when creating a DNA database of STR markers.


Aim
To investigate the influence of specific intrapopulation genetic structures on interpopulation relationships. Special focus was the influence of island population isolation on the substructuring of the Croatian population, and the influence of regional population groups on the substructuring of Southeast European populations.
Methods Autosomal short tandem repeat (STR) loci were analyzed by using four forensic parameters: matching probability (PM), power of discrimination (PD), power of exclusion (PE), and polymorphic information content (PIC) on a sample of 2877 unrelated participants of both sexes. A sample set comprising 590 participants was analyzed for the first time, and 2287 participants were included from previous studies. The analysis was performed with Power-Stats v. 1.2.

Results
The analysis of forensic parameters for all nine loci in the Croatian subpopulations showed the largest deviations in the populations of the islands of Korčula and Hvar. The smallest deviations were found in the mainland population. As for Southeast European populations, the largest deviations were found in the population of North Macedonia, followed by Romania, Albanians from Kosovo, and Montenegro, while the smallest deviations were found in the population of Hungary.

Conclusion
The comparison of forensic parameters between different subpopulations of Croatia and Southeast Europe indicates that the isolation of individual Croatian subpopulations and rare alleles in their gene pool affect the values of forensic parameters. Specific features of (sub) populations should be taken into account for appropriate sampling of the total population when creating a DNA database of STR markers.
Microsatellite markers are used in forensic research due to their high power of discrimination (PD) (usually >0.9, with observed heterozygosity >70%), position at separate chromosome locations (to avoid closely related loci), consistency and reproducibility of results after multiplication with other markers, and rare occurrences of stutter products (1). The use of short tandem repeat (STR) markers in paternity testing and forensic genetic analysis increases the PE from an initial 30% to 40% for blood typing, 80% for tissue typing, HLA analysis, 90% for combinations of HLA analysis with serological tests, and finally 99.99% for RFLP analysis to a minimum of 99.999% (2)(3)(4). In forensic research, the advantage of microsatellite markers is the possibility of simultaneous analysis of multiple loci in multiplex STR systems, which allows a high degree of individualization in identifying traces. In forensic applications where DNA is usually degraded, microsatellite markers of 100 to 400 bp are better markers than minisatellites (VN-TRs) of 400 to 1000 bp (4).
Microsatellite DNA differs for each person, but two people may have the same allelic variants at one or more STR loci. However, the probability of finding allelic variants in two individuals at, for example, 15 STR loci, analyzed using the PowerPlex 16 kit for the white population is 1.83 × 10 17 (5). The average mutation rate for loci is variable, but its values are below 0.1%. Based on previous forensic research, the loci with the lowest mutation rate are CSF1PO, TH01, TPOX, D5S818, and D8S1179, while D21S11, FGA, D7S820, D16S539, and D18S51 have a significantly higher mutation rate (2,(6)(7)(8).
Specific population DNA databases serve to determine the genetic diversity of populations and to facilitate statistical calculations in forensic genetics. They are not based on individual DNA profiles, but the profiles represented in a particular population are used to determine allelic frequencies for further statistical calculations, ie, analysis of forensic identification parameters based on the assessment of PM, PD, PE, degree of loci heterozygosity (H), and likelihood ratio (LR). In order to form the most representative database of a certain population, it is important to investigate the genetic diversity of the population, its features, size, isolation, and the degree of genetic differentiation in relation to neighboring and distant populations (4).
Therefore, the aim of this study was to investigate the existence of substructured subpopulations in Croatia, and the impact of their specific intrapopulation genetic structure on interpopulation relationships, and how these relation-ships affect the basic forensic statistical parameters of genetic STR markers.

Sample
A total of 2877 unrelated participants of both sexes from the area of Southeastern Europe were analyzed. A sample set comprising 590 participants was analyzed for the first time in this study and includes the following populations: Baranja (n = 397), island of Ugljan (n = 58), island of Pašman (n = 10), island of Dugi Otok (n = 14), and Slovenia (n = 111). The islands of Ugljan, Pašman, and Dugi otok were analyzed as one population (north Dalmatian islands, NDI) due to the small number of samples and relatively small geographical distance and good connections between them. Blood samples of all participants were collected as part of field research, after participants signed the informed consent. DNA isolation was performed in the Laboratory for Molecular Anthropology of the Institute for Anthropological Research, Zagreb, and the genetic analyses of STR markers were performed in the DNA Laboratory, Genos Ltd, Zagreb.
To enable a hierarchically structured analysis, the entire sample was divided into two hierarchical groups. The first group consisted of all Croatian subpopulations: mainland (Baranja and continental Croatia) and islands: Krk, Cres, NDI, Brač, Hvar, Korčula, and Vis. The second group consisted of Southeast European populations (Montenegro, Serbia, Croatia, North Macedonia, Hungary, Bosnia and Herzegovina, Romania, Albanians from Kosovo, Greece, and Slovenia).
In order to determine the relationship between the forensic parameters of the Croatian subpopulations, a "potential" reference base of the Croatian population was simulated. Out of 1230 respondents, a sample including 45% of the participants was formed by random selection. The "potential" reference base of the population of Southeast Europe was simulated using the same principle. Out of 1805 participants, a sample for the "potential" reference base was formed including 30% of participants. The sample of Croatia, which included 1230 respondents in the hierarchical level analysis, was reduced to 158 participants to mirror their actual share among Southeast European populations for which microsatellite databases were used in the study (the ratio of continental and island populations in Croatia is 97.2% to 2.8%). Specifically, the Croatian sample included the continental part (n = 100) (Zagreb, Pazin, Delnice, Zabok, and Donji Miholjac) (9), a randomly selected sample from Baranja (N = 44), and 5 samples from Baranja with DNA profiles containing alleles characteristic only for the Croatian population. The island part contained 9 samples selected randomly or based on their unique presence in the Croatian population. In this way, the ratio of continental to island populations was 94.3% to 5.7%, presenting a slightly higher percentage in favor of the islands because of the inclusion of their specific DNA profiles without which the Croatian population base would not be representative.

StR marker analysis
DNA was isolated from whole-blood samples by using the salting out method (24), followed by polymerase chain reaction (PCR) and capillary electrophoresis. AmpFLSTR Identifiler PCR Amplification Kit (Applied Biosystems, Foster City, CA, USA) was used for STR genotyping. Fragment analysis was performed on 3130 genetic analyzer (Applied Biosystems), ABI Data Collection Software, and GeneMapper TM 3.2 Software (Applied Biosystems).

Bio-statistical analyses
Since complete data for 2877 samples were not available for all loci included in the AmpFLSTR Identifiler PCR Amplification Kit, in order to make the data comparable, bio-statistical analyses were performed for nine autosomal STR loci (D3S1358, vWA, FGA, TH01, TPOX, CSF1PO, D5S818, D13S317, D7S82).

Forensic parameters
The following forensic parameters were used in the study: i) matching probability or probability of match (pM), which shows how many people need to be searched to find the same DNA profile in a randomly selected individual; ii) discrimination power (Pd), which is calculated by subtracting pM (1-pM) from the number 1, iii) exclusion power (PE), which is defined as the proportion of individuals whose DNA profile differs from a randomly selected individual in a typical paternity case, and iv) the degree of heterozygosity (H), or a measure of standard genetic diversity. The informativeness of genetic markers increases with increasing degree of heterozygosity. However, when the sample was collected from a genetically isolated population, a lower total number of alleles and a smaller number of alleles per locus can be expected to occur than when collected from a genetically heterogeneous population. Although the degree of heterozygosity increases with increasing number of alleles, it depends on its frequency. Regardless of the number of alleles at a locus, the degree of heterozygosity is maximal when the frequencies of the alleles are equal. Another forensic parameter used in the analysis is the degree of polymorphism (PIC), another measure of the informativeness of genetic markers. The analysis of all the above forensic parameters was performed using the statistical package PowerStats v1.2 (25).

Data standardization (Z-values)
In order to compare the forensic parameters of the subpopulations of Croatia and the population of Southeast Europe through the differences of the mean values for real and "simulated" samples, the original values were transformed into standardized or Z-values. The standardized value was calculated by determining the deviation of an entity from the arithmetic mean. Thus, the standardized value is a relative measure of deviation of each entity from the arithmetic mean expressed in parts of the standard deviation (26).

Forensic parameters in Croatian subpopulations
The results of the forensic parameters analysis in eight different Croatian subpopulations are shown in Table 1 and   If we consider PIC as the second measure of genetic marker informativeness (Table 1)

Forensic parameters in Southeast european populations
The results of forensic parameters analysis in Southeast European populations are shown in Table 2 and Figures 6-10.

Z-values of the analyzed forensic parameters in Croatian subpopulations
The largest deviations of PM were observed at the loci TH01 and TPOX among participants from Korčula, and at the locus TPOX in the population of Vis ( Figure 1). The largest deviations of PD were observed at the TH01 and TPOX loci among participants from Korčula, followed by the population of Hvar, where deviations were observed at the D3S1358 and FGA loci. Deviations at the TPOX and D7S820 loci were observed in the Vis population, while deviations at the FGA, TH01, and D5S818 loci were observed in the NDI population ( Figure 2). The smallest and most uniform deviations were determined for PE ( Figure 3). However, when inspecting the observed deviations in detail, we observed that they were again largest in the populations of Korčula, Hvar, and Vis. The largest deviations of the PIC were again shown at the TPOX and TH01 loci in the population of Korčula, at the TPOX locus in the population of Vis, and at the D7S820 locus in the population of Krk and Vis ( Figure 4).
The largest average values of the four analyzed forensic parameters in the studied Croatian subpopulations were found in the population of Korčula, followed by the population of Hvar. The smallest deviations were found in the mainland population ( Figure 5).

Z-values of analyzed forensic parameters in Southeast european populations
The largest deviations of PM were observed at the D3S1358 locus in the populations of Romania and North Macedonia, and at the FGA locus in the population of Montenegro ( Figure 6). The largest deviations of PD were observed at the D3S1358 locus in the populations of Romania and North Macedonia and at the FGA locus in the population of Montenegro (Figure 7). The largest deviations of PE were found at the loci D5S818 and D13S317 in the population of Montenegro, and the loci TH01 and D7S820 in the population of Croatia ( Figure 8). The largest deviations of the PIC degree of polymorphism were found at the TH01 locus in the population of Slovenia, followed by the FGA locus in the population of Montenegro, Slovenia, Bosnia and Herzegovina, and Croatia ( Figure 9).
The average values of the four analyzed forensic parameters for all nine loci in the studied Southeast European populations are shown in Figure 10. The largest deviations were found in the population of North Macedonia, followed by Romania, Albanians from Kosovo, and Montenegro.

DiSCuSSioN
In our study, based on the conducted informativeness analyses of STR loci, the FGA locus was the most informative, while the TPOX locus was the least informative locus in most analyzed Croatian subpopulations. This finding confirms the results of previous research (9), which determined the highest genetic diversity (heterozygosity) for the FGA locus (83.9%-86.9%), and the lowest for the TPOX locus (50.9%-67.6%). Similar findings for these two loci have been confirmed in Southeast European populations and other similar studies (8,27).

influence of interpopulation relations on forensic parameters of StR markers
Although the degree of total genetic differentiation (FST) for the studied Croatian subpopulations was low, which indicates low substructuring, the difference between individual population groups cannot be ignored and should be taken into account when creating an appropriate population DNA database. Bearing in mind the structural complexity of Croatia, whose subpopulations still retain their distinctive features shaped by different ethno-historical migratory processes and lifestyles, it is necessary to determine the impact of small but significant substructuring levels in particular populations (28).
Croatia has numerous island populations with pronounced endogamy practices, which in small communities can lead to consanguinity (29). Diversity caused by substructuring due to isolation and blood relationship may be emphasized in specific populations, and may be important for forensic evaluation of DNA profiles (30,31). If such relationships exist within the population, it is necessary to develop more precise population DNA databases. In order to help create a DNA database that would best represent the substructuring of the Croatian population, we compared the forensic parameters of genetic STR loci calculated for each population group with a potential reference base obtained by random selection from the analyzed data. The largest deviations were observed in the population of the island of Korčula, followed by the island of Hvar. Since the previously described methods suggested that the population of Korčula was the most closed and isolated, the deviation of this population indicates that substructuring influenced the forensic parameters of the studied STR markers. On the other hand, the smallest deviations from the "random sample" were observed in the mainland population, as expected. The obtained results indicate the specificity of individual populations that should be taken into account when creating a reference DNA database of all subpopulations in Croatia. Such a database would be used to determine the frequency of alleles of different loci and to statistically calculate the "rarity" of a particular DNA profile within the general population (32,33).
The results of the forensic parameters analysis in all studied Croatian and Southeast European populations suggest the lowest informativeness of the TPOX locus, which had the highest PM in all analyzed populations, as well as the lowest H and PIC. On the other hand, the most informative locus was the FGA. The FGA was the most polymorphic locus with the highest degree of heterozygosity, as well as discriminant power and PE, and the lowest values of PM in most populations. The standard index of genetic diversity, as well as genetic diversity at the molecular level, also determined the lowest values at the TPOX and the highest at the FGA locus.
The average values of basic forensic parameters for all nine loci showed greater deviations at the level of Croatian subpopulations than at the level of Southeast European populations. In Croatia, the largest deviations were found in the population of the Korčula, followed by the population of Hvar, while the smallest deviations were found in the mainland population. Among Southeast European populations, the largest deviations were observed in the population of North Macedonia, followed by the populations of Romania, Albanians from Kosovo, Montenegro, Bosnia and Herzegovina, and Croatia.
The comparison of forensic parameters between different subpopulations of Croatia and Southeast Europe indicates that the isolation of individual Croatian subpopulations and certain rare alleles in their gene pool affect the values of forensic parameters. This should be taken into account in the sampling of the total population when creating a DNA database of STR markers. Individuals from such isolates must be represented in sufficient numbers to comply with the above rule, and rare alleles characteristic of Croatian subpopulations were included in the representative base used in this article. The main limitation of our study was using only 9 STR loci, because it was the usual available raw data. Furthermore, we were limited to the selected population data from the Southeastern European region that were published at the time and could not use all the population data. However, our results confirm that the influence of structuring of the two hierarchal levels assessed in our research could be determined with this number of loci. Additional research should further explain the influence of more STR of loci on substructuring, as well as perform the comparison with larger population data sets.